update: nvm representation #68

fia0 · 2025-02-24T16:38:31Z

I've worked a bit on the previous draft (#63) of the NVM tree variant of haura's nodes to improve performance. This is the current version of this endeavour. Performance is improved on the new variant of the tree as well as on the old "block" variant, thanks to various performance fixes throughout the storage engine. Additionally, StorageKinds are introduced to map tiers to storage media types. Oh, and we allow for YAML configs because I don't want to write `foo: {{{{{[[{ option: null }]]}}}}}` anymore.

Also, some crucial bugs related to node balancing in haura are fixed. This PR arguably fixes some problems related to the epsilon of our B^epsilon tree which have existed for some time.

Compression still needs to be integrated with this variant of the tree as well as some safety checks.

For quite some time, sequential insertion constructed a tree which did not really adhere to the B^epsilon-tree rules. This was due to the nodes-in-cache optimization in the insertion code, which skips insertion into nodes when their child nodes are in cache. On sequential insertion this led to many leaves being created, with all of their pivots inserted into the parent of the last node in cache. This was never checked because we only call rebalance on the final node, which was the last node in cache. As a result these parents grew without checks and pivots were essentially just glued together. First, this slows down searching in the node. Second, all access guarantees and buffer space normally provided by the B^epsilon tree are gone, and with only pivots our tree essentially behaved like a B-tree in these scenarios.

Why this was never caught before I don't know, but this commit fixes the behavior by doing two things:

1. The `is_too_large` of the node objects now includes this space division of at most B^epsilon space for pivots. As soon as nodes overstep this boundary they are split to adhere to B^epsilon-tree construction, but they might be smaller than 4m, 1m, whatever. This has implications for performance (positive and negative) but is the correct thing to do.
2. Before we check whether the child of the current node is in cache and can be modified, we check whether the current node is already too large. If this is the case we DO NOT SKIP THE CURRENT NODE but instead insert the message into the current internal node. This causes more operations on insertion but also makes future updates as cheap as they are actually expected to be given the complexity of the B^epsilon tree.

In the context of this, another bug was fixed which highlights how problematic this behavior was: the `get_with_info` code of the node was not able to fetch an entry when it was not present in the leaves. Due to the bug when constructing the tree sequentially, this was somehow not caught before. It is fixed now.
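The two changes described in this commit message can be sketched roughly as follows. This is a minimal illustration and not haura's actual code: `InternalNode`, `pivot_budget`, `Step` and `insertion_step` are hypothetical names, sizes are plain byte counts, and the pivot budget is assumed to be B^epsilon of the maximum node size B. How haura actually measures these quantities is not stated here.

```rust
/// Hypothetical, simplified internal node: only the sizes that matter here.
struct InternalNode {
    /// Bytes used by pivot keys and child pointers.
    pivot_bytes: usize,
    /// Bytes used by buffered messages.
    buffer_bytes: usize,
}

/// Assumed pivot budget: B^epsilon bytes for a node of at most B bytes.
fn pivot_budget(max_node_size: usize, epsilon: f64) -> usize {
    (max_node_size as f64).powf(epsilon).floor() as usize
}

impl InternalNode {
    /// A node is too large if its pivots alone exceed the B^epsilon budget,
    /// even when the node as a whole is still far below the maximum size.
    fn is_too_large(&self, max_node_size: usize, epsilon: f64) -> bool {
        self.pivot_bytes > pivot_budget(max_node_size, epsilon)
            || self.pivot_bytes + self.buffer_bytes > max_node_size
    }
}

/// What the insertion path does at the current internal node.
#[derive(Debug, PartialEq)]
enum Step {
    /// Skip this node and continue with the cached child.
    Descend,
    /// Insert the message into this node's buffer.
    BufferHere,
}

/// The size check comes first: an oversized node is never skipped, so it
/// ends up as the final node of the insertion and gets rebalanced.
fn insertion_step(
    node: &InternalNode,
    child_in_cache: bool,
    max_node_size: usize,
    epsilon: f64,
) -> Step {
    if node.is_too_large(max_node_size, epsilon) {
        return Step::BufferHere;
    }
    if child_in_cache {
        Step::Descend
    } else {
        Step::BufferHere
    }
}
```

With a 4 MiB node and epsilon = 0.5 the assumed pivot budget is roughly 2 KiB, so a node holding 4 KiB of pivots and no messages counts as too large and takes the message itself, even though its child is in cache.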

SajadKarim and others added 30 commits January 29, 2025 12:04

temp checkin

35e2a69

temp checkin

7c3540d

temp checkin

150e75d

temp checkin

71aa152

temp checkin

475488c

temp checkin

2bd71ec

temp checkin

3049e6a

push unfinished changes to compare code with the main branch.

d099fa0

Fix some issues.

cf2bd22

Resolve some compilation issues.

49c1464

Bug fix in-progress.

fd12fcf

Bug fix is still in progress.

2837d3b

Save the changes made thus far.

3d8127f

Save the changes made thus far.

2a53dc6

temp checkin

0ad3129

Add changes to the unit tests that are related to NVM. For the time being, an alternative approach is used to verify the node size and structure.

a92102e

Move changes related to reading individual entries from NVM.

ce98d94

Remove some unnecessary changes.

4054cd7

NVM-optimized Bepsilon tree for Haura.

8003b75

cow_bytes: optimize archive impl

961bbd0

dmu: only copy metadata from nvm nodes

f5b331e

dmu: fix buf error

49e5b0f

Introduced by rebase.

tree: prepare partial reading of nvm leaf nodes

29e9c44

tests: add test for basic key-value tests

bfa8a84

tests: update snapshots

3592ded

tests: increase test db size for pivot key

5cb58ea

Internal fragmentation made this necessary with the smaller cache size for key-value store tests.

tree: adjust nvm leaf impl

eea86c5

tree: improve leaf layout

88b5b17

tree: shorten nvmleaf test names

01a44da

tree: segment private nvmleaf packing impl

36ae1b2

fia0 and others added 30 commits January 29, 2025 12:24

vdev: avoid copies

7df7743

dmu: introduce integrity modes

d8f852d

They are meant to allow nodes to do their own integrity checks, like internal checksumming on single entries. Analogously, this can be done for compression.
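As a rough illustration of what per-entry checksumming inside a node could look like, here is a minimal sketch. Everything in it is an assumption for illustration: `IntegrityMode`, its variants, `Entry` and `read_entry` are hypothetical names and not haura's API, and FNV-1a merely stands in for whatever checksum is actually used.

```rust
/// Hypothetical integrity modes; names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum IntegrityMode {
    /// The storage layer checksums the whole serialized node.
    External,
    /// The node verifies its entries itself.
    Internal,
}

/// 64-bit FNV-1a, used here only as a stand-in checksum.
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// A single entry that carries its own checksum.
struct Entry {
    value: Vec<u8>,
    checksum: u64,
}

impl Entry {
    fn new(value: Vec<u8>) -> Self {
        let checksum = fnv1a(&value);
        Entry { value, checksum }
    }

    /// Recompute the checksum and compare it with the stored one.
    fn verify(&self) -> bool {
        fnv1a(&self.value) == self.checksum
    }
}

/// In `Internal` mode the entry is verified on its own, so a single entry
/// can be read and checked without checksumming the whole node.
fn read_entry(entry: &Entry, mode: IntegrityMode) -> Result<&[u8], &'static str> {
    match mode {
        IntegrityMode::Internal if !entry.verify() => Err("entry checksum mismatch"),
        _ => Ok(&entry.value),
    }
}
```

The point of the split is that partial reads of a node stay verifiable. In `External` mode this sketch trusts that the surrounding layer has already verified the full node.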

size: add separate cache size

7963c87

dmu: avoid more copies

1dc69ba

buffer: fix dealloc on owned buffers

8cca20f

buffer: assert self owned buffers in BufWrite

5f8739f

betree: add get object size to c interface

41bb843

fio: check if database can be used without prefilling

8ac2263

fio: line break on error

df25559

betree: return object size in ulonglong via c interface

5d1b075

The other one was just silly.

bectl: add yaml config

2f1d252

tmp

ac62145

tree: remove memory copies copyless

08c43eb

tree: flush on edge cases

5205112

tree: fix copyless double ended iter

1d200a8

tree: propagate cache size change for copyless leafs

80abb5d

bench: sort runs in plots by max

0698652

bench: add ycsb A & B

90a05e3

tree: transfer raw buf to raw sliced cow bytes

d705feb

tree: replace leaf with packed buffer

309efcf

tree: fix high fanout for copyless internal

c6fceb6

tree: use correct sizes on removal and update

0920afa

used the absolute storage size instead of cache size

tree: fix object cache size updates

016524a

tree: fix cache size propagation

30f8f4c

tree: better storage map default

26dabf4

tree: quick fix for size_delta for child buffer in internal node

a58a20d

tree: remove old size const

eaa4ccb

tree: remove single size counter

d352911

fio: reset error on failed reinit

087374e
